Example 1: Applying a Data Pipeline to Multiple DataFrames
- This example demonstrates how to use the `DataPipeline` class from the `src.data.dataset` module to apply a sequence of transformations to multiple pandas DataFrames.
- The pipeline is initialized from a configuration file (`pipeline.yaml`), and the transformations are applied to a list of sample DataFrames representing structured data with fields such as `radius`, `volume`, and `other`.
- The output displays the original data and the transformed results.
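The log output below names three pipeline steps (`choose_columns`, `delete_outlier_in_volume`, `concat_dataframes`). A configuration for them might look roughly like the following sketch; the actual keys and structure depend on how `DataPipeline` parses `pipeline.yaml`, so treat every key name and value here as a hypothetical illustration:

```yaml
# Hypothetical sketch of settings/pipeline.yaml -- key names are assumptions.
steps:
  choose_columns:
    columns: [radius, volume]
  delete_outlier_in_volume:
    threshold: 100
  concat_dataframes:
    join: inner
```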
In [1]:
import pandas as pd
from src.data import dataset
# Sample data
data1 = pd.DataFrame({
"radius": [1, 2, 3, 4],
"volume": [50, 150, 30, 200],
"other": [9, 8, 7, 6]
})
data2 = pd.DataFrame({
"radius": [5, 6, 7, 8],
"volume": [80, 90, 120, 40],
"other": [1, 2, 3, 4]
})
data = [data1, data2]
print(data)
# Initialize DataPipeline and apply the data transformations
pipeline = dataset.DataPipeline(config_path="settings/pipeline.yaml")
transformed_data = pipeline.apply(data)
print("Transformed Data:")
print(transformed_data)
2025-01-03 10:10:57,136 - INFO - dataset.py - Loaded the YAML configuration file <_io.TextIOWrapper name='settings/pipeline.yaml' mode='r' encoding='utf-8'>.
2025-01-03 10:10:57,137 - INFO - dataset.py - Building data transformation pipeline.
2025-01-03 10:10:57,137 - INFO - dataset.py - Added step 'choose_columns' to the pipeline.
2025-01-03 10:10:57,138 - INFO - dataset.py - Added step 'delete_outlier_in_volume' to the pipeline.
2025-01-03 10:10:57,139 - INFO - dataset.py - Added step 'concat_dataframes' to the pipeline.
2025-01-03 10:10:57,139 - INFO - dataset.py - Data transformation pipeline built successfully.
2025-01-03 10:10:57,139 - INFO - dataset.py - Applying the data transformation pipeline.
2025-01-03 10:10:57,141 - INFO - dataset.py - Selected columns from 2 DataFrames.
2025-01-03 10:10:57,141 - INFO - dataset.py - Filtered 2 DataFrames.
2025-01-03 10:10:57,142 - INFO - dataset.py - Concatenating DataFrames with join type 'inner'.
2025-01-03 10:10:57,143 - INFO - dataset.py - Concatenated 2 DataFrames into one DataFrame with shape (5, 2).
2025-01-03 10:10:57,143 - INFO - dataset.py - Data transformation pipeline applied successfully.
[   radius  volume  other
 0       1      50      9
 1       2     150      8
 2       3      30      7
 3       4     200      6,
    radius  volume  other
 0       5      80      1
 1       6      90      2
 2       7     120      3
 3       8      40      4]
Transformed Data:
   radius  volume
0       1      50
1       3      30
2       5      80
3       6      90
4       8      40
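Judging from the log and the result, the three steps amount to a column selection, an outlier filter on `volume`, and an inner-join concatenation. The following plain-pandas sketch reproduces that behavior; the threshold of 100 is an assumption inferred from the output (every surviving row has `volume` below 100), not a value read from the actual `pipeline.yaml`:

```python
import pandas as pd

data1 = pd.DataFrame({"radius": [1, 2, 3, 4], "volume": [50, 150, 30, 200]})
data2 = pd.DataFrame({"radius": [5, 6, 7, 8], "volume": [80, 90, 120, 40]})

# Step 1: keep only the columns of interest (choose_columns)
frames = [df[["radius", "volume"]] for df in (data1, data2)]
# Step 2: drop rows whose volume exceeds the assumed threshold of 100
frames = [df[df["volume"] < 100] for df in frames]
# Step 3: concatenate with an inner join and rebuild the index
result = pd.concat(frames, join="inner", ignore_index=True)
print(result)
# radius column ends up as [1, 3, 5, 6, 8], matching the output above
```

This yields the same `(5, 2)` DataFrame as the pipeline run shown above.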
Example 2: Image Processing Pipeline with Visualization
- This example demonstrates how to use the `ImagePipeline` class from the `src.data.dataset` module to apply image processing transformations to a set of sample images.
- The pipeline is configured via the `image_pipeline.yaml` file and processes a list of mock images (e.g., a white image and a black image).
- Additionally, the `ImageVisualizer` class from `src.utils.plotters` is used to visualize the images before and after processing.
- The output includes the shapes of the processed images and their visual representations.
In [2]:
import numpy as np
from src.data import dataset
from src.utils import plotters
# Sample images (mock data for illustration)
image1 = np.ones((100, 100, 3), dtype=np.uint8) * 255 # White image
image2 = np.zeros((100, 100, 3), dtype=np.uint8) # Black image
images = [image1, image2]
visualizer = plotters.ImageVisualizer(width=400, height=400)
visualizer.display_images(images)
# Initialize ImagePipeline and apply image processing
pipeline = dataset.ImagePipeline(config_path="settings/image_pipeline.yaml")
processed_images = [pipeline.apply(img) for img in images]
print("Processed Images:")
for idx, img in enumerate(processed_images):
print(f"Image {idx+1}: Shape={img.shape}")
visualizer.display_images(processed_images)
Displaying Image 1: Shape=(100, 100, 3)
Displaying Image 2: Shape=(100, 100, 3)
2025-01-03 10:10:57,910 - INFO - dataset.py - Building image processing pipeline.
2025-01-03 10:10:57,911 - INFO - dataset.py - Added step 'resize' to the pipeline.
2025-01-03 10:10:57,912 - INFO - dataset.py - Added step 'normalize' to the pipeline.
2025-01-03 10:10:57,913 - INFO - dataset.py - Added step 'convert_color' to the pipeline.
2025-01-03 10:10:57,913 - INFO - dataset.py - Image processing pipeline built successfully.
2025-01-03 10:10:57,914 - INFO - dataset.py - Applying the image processing pipeline.
2025-01-03 10:10:57,915 - INFO - dataset.py - Resized image to 100x100.
2025-01-03 10:10:57,916 - INFO - dataset.py - Normalized image.
2025-01-03 10:10:57,916 - INFO - dataset.py - Converted image color using code 6.
2025-01-03 10:10:57,917 - INFO - dataset.py - Image processing pipeline applied successfully.
2025-01-03 10:10:57,917 - INFO - dataset.py - Applying the image processing pipeline.
2025-01-03 10:10:57,918 - INFO - dataset.py - Resized image to 100x100.
2025-01-03 10:10:57,919 - INFO - dataset.py - Normalized image.
2025-01-03 10:10:57,919 - INFO - dataset.py - Converted image color using code 6.
2025-01-03 10:10:57,920 - INFO - dataset.py - Image processing pipeline applied successfully.
Processed Images:
Image 1: Shape=(100, 100)
Image 2: Shape=(100, 100)
Displaying Image 1: Shape=(100, 100)
Displaying Image 2: Shape=(100, 100)
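Note that the processed images lose their channel axis: shape `(100, 100, 3)` becomes `(100, 100)`. The log's "color code 6" corresponds to OpenCV's `cv2.COLOR_BGR2GRAY`, i.e. a grayscale conversion. A minimal NumPy-only sketch of the normalize-then-grayscale steps (using the standard luma weights as an approximation of OpenCV's conversion, and skipping the no-op resize):

```python
import numpy as np

image = np.ones((100, 100, 3), dtype=np.uint8) * 255  # white image, as above

# Normalize pixel values to [0, 1]
normalized = image.astype(np.float32) / 255.0
# Grayscale conversion; weights are in BGR order to mirror COLOR_BGR2GRAY
weights = np.array([0.114, 0.587, 0.299], dtype=np.float32)
gray = normalized @ weights
print(gray.shape)  # (100, 100) -- the channel axis is gone
```

This explains why `Shape=(100, 100, 3)` before processing becomes `Shape=(100, 100)` after.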
Save as HTML file
In [6]:
import plotly
plotly.offline.init_notebook_mode()
In [7]:
%%time
## Save as html file
from src.utils import utils
source_notebook = "./notebooks/01_process_pipeline_dataframe.ipynb"
target_folder = "./docs/reports"
converter = utils.get_notebook_converter(
source_notebook, target_folder, additional_pdf=False
)
converter.convert()
last save time: 2025-01-03 11:57:42.451276
Converting Notebook file: ./notebooks/01_process_pipeline_dataframe.ipynb
HTML with embedded Plotly figures saved to docs\reports\01_process_pipeline_dataframe.html
CPU times: total: 297 ms
Wall time: 350 ms
d:\Foester\initial preparation\pipeline\.venv\share\jupyter\nbconvert\templates\base\display_priority.j2:32: UserWarning: Your element with mimetype(s) dict_keys(['application/vnd.plotly.v1+json']) is not able to be represented.